Personal Loan Campaign

Background and Context

AllLife Bank is a US bank that has a growing customer base. The majority of these customers are liability customers (depositors) with varying sizes of deposits. The number of customers who are also borrowers (asset customers) is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business and, in the process, earn more through the interest on loans. In particular, the management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors).

A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise campaigns with better target marketing to increase the success ratio.

As a data scientist at AllLife Bank, you have to build a model that will help the marketing department identify the potential customers who have a higher probability of purchasing the loan.

Objective

To predict whether a liability customer will buy a personal loan or not, to find which variables are most significant, and to identify which segment of customers should be targeted more.

Data Dictionary

Loading Libraries

Exploratory Data Analysis on the data

Load data

View the first and last 5 rows of the dataset.

Understand the shape of the dataset.

Check the data types of the columns for the dataset.

Summary of the dataset.

Univariate Analysis

Observations

Observations

Bivariate analysis

Summary of EDA

Data Description:

Observations from EDA:

Data Pre-Processing

Data Preparation

Creating training and test sets.
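Because only a small share of customers take a loan, it helps to keep the class proportions the same in the training and test sets. The sketch below is a plain-Python illustration of a stratified hold-out split, not the notebook's own code; the function name `stratified_split`, the 30% test size and the toy data are assumptions. scikit-learn's `train_test_split` offers the same behaviour through its `stratify` argument.

```python
import random
from collections import defaultdict


def stratified_split(rows, labels, test_size=0.3, seed=1):
    """Hold out test_size of each class so both sets keep the class ratio."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    indices_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    train_idx, test_idx = [], []
    for idxs in indices_by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data (not the bank's): 10 positives out of 100 rows.
y = [1] * 10 + [0] * 90
X = [[i] for i in range(100)]
X_train, X_test, y_train, y_test = stratified_split(X, y, test_size=0.3)
print(len(X_train), len(X_test), sum(y_train), sum(y_test))  # 70 30 7 3
```

Both sets end up with 10% positives, so test-set recall is measured on the same class mix the model was trained on.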

Building the model

Model evaluation criterion

Class 0 means the customer does not need a loan, and class 1 means the customer needs a loan.

How can we reach out to more people (higher recall, so that we do not lose business by missing customers who would take a loan) while keeping precision high enough for targeted marketing?

First, let's create functions to calculate different metrics and the confusion matrix, so that we don't have to repeat the same code for each model.
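A minimal sketch of such helpers in plain Python, assuming binary labels with 1 = needs loan. The names `confusion_matrix_counts` and `classification_metrics` are hypothetical, and the notebook may well use scikit-learn's metric functions instead.

```python
def confusion_matrix_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = needs loan."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision and F1 computed from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0        # share of real borrowers found
    precision = tp / (tp + fp) if tp + fp else 0.0     # share of targeted who borrow
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(confusion_matrix_counts(y_true, y_pred))  # (4, 1, 1, 2)
print(classification_metrics(y_true, y_pred))
```

A false negative here is a missed borrower (lost opportunity), which is why recall is tracked alongside precision.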

Logistic Regression

Finding the coefficients

Coefficient interpretations
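In logistic regression each coefficient acts on the log-odds, log(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk, so a positive coefficient raises the predicted probability of taking a loan and a negative one lowers it, other features held fixed. The sketch below illustrates this with made-up coefficients; `predict_proba` here is a hypothetical helper, not scikit-learn's method of the same name.

```python
import math


def predict_proba(intercept, coefs, x):
    """P(y = 1 | x) = sigmoid(intercept + sum(b_i * x_i))."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical intercept and coefficients for two made-up features.
intercept, coefs = -2.0, [0.5, -0.3]
p_base = predict_proba(intercept, coefs, [1.0, 1.0])
p_more = predict_proba(intercept, coefs, [2.0, 1.0])  # feature 1 up by one unit

# The log-odds recovered from the probability equal the linear predictor.
print(math.log(p_base / (1 - p_base)))  # about -1.8
print(p_more > p_base)  # True: positive coefficient raises the probability
```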

Converting coefficients to odds

Odds from coefficients
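Since coefficients are changes in log-odds per one-unit increase of a feature, exponentiating a coefficient b gives the odds ratio exp(b), and (exp(b) - 1) * 100 gives the percentage change in odds. A short sketch with hypothetical coefficient values (not the model's fitted ones); `coefficients_to_odds` is an illustrative name.

```python
import math


def coefficients_to_odds(coefs):
    """Map each log-odds coefficient b to (odds ratio e**b, % change in odds)."""
    return {name: (math.exp(b), (math.exp(b) - 1) * 100) for name, b in coefs.items()}


# Hypothetical coefficients, for illustration only.
example = {"feature_a": math.log(2), "feature_b": -0.5}
for name, (odds_ratio, pct_change) in coefficients_to_odds(example).items():
    print(f"{name}: odds ratio {odds_ratio:.3f}, change in odds {pct_change:+.1f}%")
```

A coefficient of log(2) doubles the odds (+100%), while -0.5 cuts them by about 39%.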

Checking model performance on training set

Checking performance on test set

ROC-AUC
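The ROC-AUC equals the probability that a randomly chosen positive customer gets a higher score than a randomly chosen negative one, with ties counted as half. This pairwise sketch is O(n_pos x n_neg) and meant for illustration only; in practice scikit-learn's `roc_auc_score` computes the same quantity.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```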

Model Performance Improvement

Optimal threshold using AUC-ROC curve
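The outline does not spell out how the optimal threshold is read off the ROC curve; a common choice, assumed here, is the threshold that maximises Youden's J statistic, TPR - FPR. A plain-Python sketch on toy data, with hypothetical helper names:

```python
def roc_points(y_true, scores):
    """(threshold, TPR, FPR) for each distinct score used as a cut-off;
    a customer is predicted as 1 when score >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / n_pos, fp / n_neg))
    return points


def youden_threshold(y_true, scores):
    """Threshold that maximises Youden's J = TPR - FPR."""
    best = max(roc_points(y_true, scores), key=lambda p: p[1] - p[2])
    return best[0]


# Toy labels and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.8]
print(youden_threshold(y_true, scores))  # 0.5
```

scikit-learn's `roc_curve` returns `fpr`, `tpr` and `thresholds` arrays from which the same maximum can be taken.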

Checking model performance on training set

Checking model performance on test set

Let's use the precision-recall curve to see if we can find a better threshold.

Checking model performance on training set

Checking model performance on test set

Model Performance Summary

Finding which features are important

Let's look at the best 8 variables

Let's look at model performance

Model Performance Summary

Conclusion

Recommendations

Model building and implementation

Decision Tree Classifier

Split Data

Build Decision Tree Model

We will build our model using the DecisionTreeClassifier class, with the default 'gini' criterion to choose splits.
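Gini impurity of a node is 1 - sum(p_k^2), where p_k is the share of class k in that node; at each step the tree picks the split whose children have the lowest sample-weighted impurity. A plain-Python sketch with toy labels (the helper names are illustrative):

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k**2); 0.0 for a pure or empty node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: children's Gini weighted by their sizes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = [1] * 10 + [0] * 90          # 10% positives
left = [1] * 8 + [0] * 2              # candidate split: most positives go left
right = [1] * 2 + [0] * 88
print(gini(parent))                   # about 0.18
print(weighted_gini(left, right))     # about 0.071, so this split reduces impurity
```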

If the frequency of class A is 10% and the frequency of class B is 90%, then class B will become the dominant class and the decision tree will become biased towards it. We start with random_state=1.
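One common remedy for this bias, not necessarily the one used in this notebook, is to weight each class inversely to its frequency. scikit-learn's `class_weight="balanced"` option uses n_samples / (n_classes * count_of_class); the sketch below reproduces that formula in plain Python for the 10% / 90% example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


labels = ["A"] * 10 + ["B"] * 90
weights = balanced_class_weights(labels)
print(weights)  # A gets weight 5.0, B about 0.556
# After weighting, each class contributes the same total weight (about 50).
print(10 * weights["A"], 90 * weights["B"])
```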

Visualizing the Decision Tree

Reducing overfitting

Using GridSearch for Hyperparameter tuning of our tree model
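Grid search simply evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below shows that loop in plain Python; `max_depth` and `min_samples_leaf` are real DecisionTreeClassifier parameter names, but the grid values and the `toy_score` function are hypothetical stand-ins for "fit a tree and return its cross-validated recall".

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical scorer; a real one would train and validate a model.
def toy_score(params):
    return -abs(params["max_depth"] - 4) - 0.01 * params["min_samples_leaf"]


grid = {"max_depth": [2, 3, 4, 5, 6], "min_samples_leaf": [1, 5, 10]}
print(grid_search(grid, toy_score))
```

The cost grows with the product of the grid sizes (15 fits here, times the number of cross-validation folds), which is why grids are usually kept small.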

Model performance evaluation and improvement

Confusion Matrix - decision tree with tuned hyperparameters

Cost Complexity Pruning
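Minimal cost-complexity pruning, as described in scikit-learn's user guide, assigns each internal node t an effective alpha, (R(t) - R(T_t)) / (number of leaves of T_t - 1), where R(t) is the node's sample-weighted impurity as a leaf and R(T_t) is the summed weighted impurity of the leaves of its subtree; the node with the smallest effective alpha is pruned first. The sketch computes that formula for hypothetical numbers; in scikit-learn, `cost_complexity_pruning_path` returns the resulting `ccp_alphas`.

```python
def weighted_impurity(impurity, n_node, n_total):
    """Node impurity scaled by the fraction of training samples it holds."""
    return impurity * n_node / n_total


def effective_alpha(node_impurity, leaf_impurities):
    """(R(t) - R(T_t)) / (|leaves of T_t| - 1) for an internal node t."""
    n_leaves = len(leaf_impurities)
    if n_leaves < 2:
        raise ValueError("subtree must have at least two leaves")
    return (node_impurity - sum(leaf_impurities)) / (n_leaves - 1)


# Hypothetical node: 20 of 100 samples, Gini 0.5, split into two leaves.
R_t = weighted_impurity(0.5, 20, 100)
R_leaves = [weighted_impurity(0.18, 10, 100), weighted_impurity(0.0, 10, 100)]
print(round(effective_alpha(R_t, R_leaves), 3))  # 0.082
```

Any `ccp_alpha` above 0.082 would collapse this subtree into a single leaf.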

Confusion Matrix - post-pruned decision tree

The post-pruned decision tree gives the highest recall on the test set.

Summary:

Actionable Insights & Recommendations

Both models we built achieve a good F1 score, with recall maximized; the decision tree performs better.

Customers with higher values of education, income, family, CC average and CD account form the core target group, and the loan campaign should focus on them.

Thank you